A good search engine does not attempt to return the pages that best match the input query. A good search engine tries to answer the underlying question. Once you realize this, you'll understand why Google (and other search engines) use such a complex algorithm to determine which results to return. The factors in the algorithm consist of "hard factors", such as the number of backlinks to a page, and perhaps social recommendations through likes and +1's. These are usually external influences. There are also factors on the page itself. Here the way a page is built and various page elements play a role in the algorithm. Only by analyzing both the on-site and off-site factors can Google determine which pages answer the question behind the query. To do this, Google has to analyze the text on a page.
In this article I will elaborate on the problems of a search engine and possible solutions. By the end of this article we won't have revealed Google's algorithm (unfortunately), but we'll be one step closer to understanding some of the advice we often give as SEOs. There will be some formulas, but do not panic: this article isn't just about those formulas. The article also contains an Excel file. Oh, and the best thing: I will use some Dutch delights to illustrate the problems.
Behold: croquets are the elongated ones and bitterballen are the round ones 😉
True OR False
Search engines have evolved tremendously in recent years, but at first they could only deal with Boolean operators. In simple terms: a term was either included in a document or not. Something was true or false, 1 or 0. Additionally, you could use operators such as AND, OR and NOT to search for documents containing multiple terms or to exclude terms. This sounds fairly simple, but it comes with some problems. Suppose we have two documents, consisting of the following texts:
Doc1:
“And our restaurant in New York serves croquets and bitterballen.”
Doc2:
“In the Netherlands you retrieve croquets and frikandellen from the wall.”
Oops, almost forgot to show you the frikandellen 😉
If we were to build a search engine, the first step is tokenization of the text. We want to be able to quickly determine which documents contain a term. This is easier if we put all tokens in a database. A token is any single term in a text, so how many tokens does Doc1 contain?
The moment you started answering this question for yourself, you probably thought about the definition of a "term". Actually, in the example "New York" should be recognized as a single term. How we can determine that those two individual words are actually one term is outside the scope of this article, so for the moment we treat each separate word as a separate token. That gives us 10 tokens in Doc1 and 11 tokens in Doc2. To avoid duplicating information in our database, we will store types rather than tokens.
Types are the unique tokens in a text. In the example, Doc1 contains the token "and" twice. Here I ignore the fact that "and" appears once with and once without a capital. As with determining what counts as a term, there are techniques to determine whether something actually needs to be capitalized. In this case we assume that we can store it without a capital, so "And" and "and" are the same type.
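To make tokens and types concrete, here is a minimal sketch in Python. The letters-only splitting and blanket lowercasing are simplifying assumptions for this article, not how a production search engine would tokenize:

```python
import re

def tokenize(text):
    # Naive tokenizer: split on anything that isn't a letter and
    # lowercase, so "And" and "and" collapse into the same type.
    return [t.lower() for t in re.findall(r"[a-zA-Z]+", text)]

doc1 = "And our restaurant in New York serves croquets and bitterballen."

tokens = tokenize(doc1)
types = set(tokens)

print(len(tokens))  # 10 tokens ("New" and "York" counted separately)
print(len(types))   # 9 types ("and" is stored only once)
```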
By storing all the types in the database together with the documents they occur in, we can search the database with the help of Booleans. The search "croquets" returns both Doc1 and Doc2. The search "croquets AND bitterballen" only returns Doc1 (see the sketch below). The problem with this method is that you are likely to get too many or too few results. In addition, it lacks the ability to rank the results. If we want to improve our method, we have to determine what we can use other than the presence or absence of a term in a document. Which on-page factors would you use to rank the results if you were Google?
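As a rough sketch of this approach: the inverted index below maps each type to the set of documents containing it, so an AND query becomes a simple set intersection (the `tokenize()` helper is the same naive one as before):

```python
import re

def tokenize(text):
    return [t.lower() for t in re.findall(r"[a-zA-Z]+", text)]

docs = {
    "Doc1": "And our restaurant in New York serves croquets and bitterballen.",
    "Doc2": "In the Netherlands you retrieve croquets and frikandellen from the wall.",
}

# Build the inverted index: type -> set of documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in tokenize(text):
        index.setdefault(term, set()).add(doc_id)

print(index["croquets"])                          # both documents match
print(index["croquets"] & index["bitterballen"])  # only Doc1 contains both
```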
Zone Indexes
A relatively simple method is using zone indexes. A web page can be divided into different zones: think of the title, description, author and body. By adding a weight to each zone in a document, we can calculate a simple score for each document. This is one of the first on-page methods search engines used to determine the subject of a page. Scoring with zone indexes works as follows:
Suppose we add the following weights to each zone:
| Zone        | Weight |
|-------------|--------|
| title       | 0.4    |
| description | 0.1    |
| content     | 0.5    |
We perform the following search query:
“croquets AND bitterballen”
And we have a document with the following zones:
| Zone        | Content                                                      | Boolean | Score |
|-------------|--------------------------------------------------------------|---------|-------|
| title       | New York Café                                                | 0       | 0     |
| description | Café with delicious croquets and bitterballen                | 1       | 0.1   |
| content     | Our restaurant in New York serves croquets and bitterballen  | 1       | 0.5   |
| **Total**   |                                                              |         | 0.6   |
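The same calculation as a rough sketch in code. The zone weights and the all-terms-must-match rule are the simplified assumptions from the example above, not Google's actual scoring:

```python
import re

def tokenize(text):
    # Same naive tokenizer as before: lowercased letter runs only.
    return [t.lower() for t in re.findall(r"[a-zA-Z]+", text)]

# Assumed zone weights from the example; a real engine would tune these.
weights = {"title": 0.4, "description": 0.1, "content": 0.5}

document = {
    "title": "New York Café",
    "description": "Café with delicious croquets and bitterballen",
    "content": "Our restaurant in New York serves croquets and bitterballen",
}

query = ["croquets", "bitterballen"]

score = 0.0
for zone, text in document.items():
    # Boolean per zone: the zone only scores if it contains every query term.
    if all(term in tokenize(text) for term in query):
        score += weights[zone]

print(score)  # 0.6 (description: 0.1 + content: 0.5)
```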
Because at some point everyone started abusing the weights assigned to, for example, the description, it became more important for Google to split the body itself into different zones and assign a different weight to each individual zone in the body.
This is quite difficult, because the web contains a huge variety of documents with different structures. Interpreting an XML document is quite simple for a machine. Interpreting an HTML document is harder: the structure and tags are much more limited, which makes the analysis more difficult. Of course there will be HTML5 in the near future and Google supports microformats, but these still have their limitations. For example, if you know that Google assigns more weight to content within the